(An Unprofessional Introduction to AI) Machine Learning -> Machine Learning Explainability for Santander, Day 25

11th iThome Ironman Contest

DAY 25

AI & Data

AI & Machine Learning series, part 25


ken36789

Team: Turing World

2019-10-11 16:42:23


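This post walks through the Kaggle kernel https://www.kaggle.com/mjbahmani/santander-ml-explainability, which takes a bank's dataset, prepares it for machine learning, and uses charts to bring out differences in the data. The kernel covers several ways of doing classification. Its author writes: "In this section, I want to try extract insights from models with the help of this excellent Course in Kaggle", pointing to a course that interested readers can follow for more detail: https://www.kaggle.com/learn/machine-learning-explainability. In this post I will introduce just one of those approaches.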

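from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# `train` and `test` are assumed to be pandas DataFrames that are already loaded.
cols = ["target", "ID_code"]
X = train.drop(cols, axis=1)   # feature columns only
y = train["target"]            # label to predict

X_test = test.drop("ID_code", axis=1)
# Hold out part of the training data for validation, then fit a random forest.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
rfc_model = RandomForestClassifier(random_state=0).fit(train_X, train_y)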

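import eli5
from eli5.sklearn import PermutationImportance

# Shuffle each feature on the validation set and measure how much the score drops.
perm = PermutationImportance(rfc_model, random_state=1).fit(val_X, val_y)
# Show the 150 highest-ranked features.
eli5.show_weights(perm, feature_names=val_X.columns.tolist(), top=150)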
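To make the idea behind permutation importance concrete, here is a minimal sketch in plain Python. It is not how eli5 is implemented; the toy data, `toy_predict`, and the helper names are all hypothetical and exist only for illustration. The idea is to shuffle one column at a time and record how far the accuracy falls compared with the unshuffled baseline.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column is shuffled, one column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column; every other column stays in place.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 determines the label, column 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]


def toy_predict(row):
    """Stand-in for a fitted model: it only looks at column 0."""
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(toy_predict, rows, labels)
print(importances)
```

Shuffling column 0 should hurt the accuracy noticeably, while shuffling column 1 changes nothing because the stand-in model never reads it. That is the same reading we apply to the eli5 table: features near the top are the ones the model leans on most.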

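For the sake of explanation, I use a Decision Tree, which you can see below.

from sklearn.tree import DecisionTreeClassifier

# Same split as before; a shallow tree (max_depth=5) keeps the diagram readable.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
tree_model = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_split=5).fit(train_X, train_y)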

from sklearn import tree
import graphviz

features = [c for c in train.columns if c not in ['ID_code', 'target']]
# Export the fitted tree to DOT format and render it with graphviz.
tree_graph = tree.export_graphviz(tree_model, out_file=None, feature_names=features)
graphviz.Source(tree_graph)
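Each node in the drawn tree shows a threshold on one feature together with a gini value. As a rough illustration of where such numbers come from, the sketch below picks the threshold on a single feature that gives the lowest weighted Gini impurity. It is a simplified stand-in, not scikit-learn's actual implementation, and the toy `values` and `labels` are made up for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 minus the sum of squared class shares."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, weighted impurity)."""
    distinct = sorted(set(values))
    best = (None, gini(labels))  # no split at all is the starting point
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy feature: small values belong to class 0, large values to class 1.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)
```

A tree repeats this kind of search over every feature at every node, which is why a limit such as max_depth=5 keeps both the training and the diagram manageable.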

from matplotlib import pyplot as plt
from pdpbox import pdp

# Draw a partial dependence plot for each of four features.
# pdp_isolate / pdp_plot are the calls used in the original kernel; newer pdpbox releases may expose a different API.
for feature in ['var_81', 'var_82', 'var_139', 'var_110']:
    pdp_goals = pdp.pdp_isolate(model=tree_model, dataset=val_X, model_features=features, feature=feature)
    pdp.pdp_plot(pdp_goals, feature)
    plt.show()
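What a partial dependence plot computes can be sketched by hand. The snippet below is a minimal illustration in plain Python, not pdpbox's implementation; `toy_predict_proba`, the toy rows, and the grid are hypothetical. For each grid value we overwrite one feature in every row, ask the model for its prediction, and average the results.

```python
def partial_dependence(predict_proba, rows, col, grid):
    """For each grid value, overwrite column `col` in every row and average the predictions."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)   # copy so the original data is untouched
            modified[col] = value
            total += predict_proba(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict_proba(row):
    """Stand-in for a fitted model: the predicted probability depends only on column 0."""
    return 0.9 if row[0] > 5.0 else 0.1


# Hypothetical validation rows with two features each.
rows = [[1.0, 7.0], [4.0, 2.0], [6.0, 3.0], [9.0, 8.0]]
grid = [2.0, 4.0, 6.0, 8.0]

curve_0 = partial_dependence(toy_predict_proba, rows, 0, grid)
curve_1 = partial_dependence(toy_predict_proba, rows, 1, grid)
print(curve_0)
print(curve_1)
```

The curve for column 0 steps up once the grid value passes 5.0, while the curve for column 1 stays flat because the stand-in model ignores that feature. A flat line in one of the plots above can be read the same way: the tree's prediction barely reacts to that variable.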

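First, thanks to the author of the kernel introduced here, https://www.kaggle.com/mjbahmani/santander-ml-explainability; please give it your support and a read. To recap the steps above: we first build a well-formed data frame, then use ML techniques to split the data and train and test on it, and finally, because we want to see how the data differ, we present the results as charts. What do these few pieces of code tell us? That machine learning reaches far beyond what we tend to imagine. Machine learning does not have to end in a physical product to count; doing nothing more than data analysis is also a form of machine learning. I hope this post shows that machine learning is not as narrow as we might think: there are plenty of reasons to try it at any time and place, and chances to practice are everywhere. I hope everyone can take their skills up a level too.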

Thanks for reading to the end. That was an unprofessional introduction to AI; see you in the next post!